How much does vaccination reduce the rate of HBV infection in the Iranian population? A Bayesian adjustment analysis.

Aim: This study aimed to estimate how the odds ratio (OR) changes with varying degrees of hepatitis B virus (HBV) underreporting. Background: Data registration is usually subject to extensive errors such as misclassification, underreporting, missing data due to lack of cooperation, error-prone factors, and, in medical studies, inadequate diagnosis by physicians or low accuracy of laboratory tests. In the present study, which examines the actual impact of vaccination on HBV prevention, both exposure and response were prone to various errors. Furthermore, some people in the community may be infected with the virus without being counted among the reported patients with HBV infection. Methods: This was a case-control study. Cases included patients with HBV referred to the gastroenterology and liver disease research center. The control group included patients without HBV who underwent a fatty liver test at Taleghani hospital laboratory. A Bayesian approach and the Gibbs sampling algorithm were used to estimate the OR. Results: The misclassification rate was mild in the raw data, but as the degree of underreporting rose to 50 and 500 unreported cases, the OR increased by about half and more than doubled, respectively, while sensitivity diminished strikingly. Conclusion: Our analysis showed that knowing the degree of underreporting is essential to accurately calculate the OR and sensitivity. In addition, although the OR varied across samples, the overall results were similar with respect to the pattern of association between exposure and response.


Introduction
Hepatitis B is a viral infection and a common cause of liver disease and cancer (1,2). In Iran, the hepatitis B vaccination program for all newborns and high-risk groups began in 1993. Since then, all newborn babies have been covered by the program and receive the vaccine in three stages: at birth, 1.5 months after birth, and 9 months after birth. As a result, the epidemiological pattern of hepatitis B virus prevalence in Iran has changed over time, and a reduction has been reported in both acute and chronic infection (10). Iran is a medium-risk country with a prevalence of 2-7% and nearly 2 million adults with chronic infection. In a meta-analysis (11), the prevalence of hepatitis B virus infection in the Iranian population was 2.2%. The highest prevalence was reported in Golestan province (8.9%) and the lowest in Kermanshah province (0.7%). The prevalence was 2.9% before 2010 and dropped to 1.3% after 2010.
In clinical and epidemiological studies that investigate the association between a binary exposure and a binary response, misclassification often arises from the low accuracy of diagnostic tests. Because the accuracy of a test is closely tied to its sensitivity and specificity (12), misclassification is defined as a function of the sensitivity and specificity of the exposure and response measurements. One of the most important causes of misclassification is the lack of a gold standard, so that changing the classification definitions changes how subjects are classified. When clinical and laboratory data are available, researchers can use them to reduce classification error. When such data are not available, statistical techniques can be used instead, one of which is the Bayesian method (13). In a study prone to misclassification of both exposure and response, non-differential misclassification of the response can be corrected with three models under a hierarchical Bayesian approach. Because the public vaccination program examined here has been running for a few decades, vaccination information may be subject to recall bias, and the HBsAg marker may carry a slight error. Consequently, misclassification and unreported cases affect HBV sensitivity and the odds ratio.
We applied Bayesian adjustment to estimate the real effect of vaccination on hepatitis B virus infection and to assess the validity and accuracy of the effect size in the presence of misclassification error and underreported cases.

Source and study population
This was a case-control study in which both cases and controls were prone to misclassification. Cases were people suspected of having hepatitis B virus who were referred to the gastroenterology and liver disease research center at Shahid Beheshti University of Medical Sciences by its vice chancellor for health affairs. The control group comprised people who underwent a fatty liver test at Taleghani hospital laboratory and were not suspected of carrying hepatitis B virus.

Modeling
The exposure variable in the present study was being up to date on the three doses of the public vaccination schedule, and the response variable was hepatitis B virus infection. Exposure and response values were corrected under the assumption that both are misclassified, following the exposure misclassification approaches of Gustafson (14) and Luta et al. (15), via three models: an exposure model (equation 1), a measurement model (equation 2), and a response model (equation 3). The models were adjusted for appropriate covariates, namely risk factors associated with HBV selected from related studies (16)(17)(18)(19).
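A generic form of such a three-model structure can be sketched as follows. The notation is assumed here for illustration and is not taken from the source equations: X_i is the true vaccination status, Z_i the covariate vector, Y_i the true HBV status, and Y_i^* the reported HBV status, with SN and SP the sensitivity and specificity of the report.

```latex
\begin{align}
\operatorname{logit} \Pr(X_i = 1 \mid Z_i) &= \alpha_0 + \boldsymbol{\alpha}^{\top} Z_i \tag{1}\\
\Pr(Y_i^{*} = 1 \mid Y_i) &= \mathrm{SN}\, Y_i + (1 - \mathrm{SP})\,(1 - Y_i) \tag{2}\\
\operatorname{logit} \Pr(Y_i = 1 \mid X_i, Z_i) &= \beta_0 + \beta_1 X_i + \boldsymbol{\beta}^{\top} Z_i \tag{3}
\end{align}
```

Under this sketch, exp(β_1) is the covariate-adjusted odds ratio linking vaccination status to HBV infection.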

Priors/Validation data
To implement the analysis in the Bayesian framework, the distributions of the parameters and their prior probabilities must be specified. In the exposure and response models, we used independent informative normal priors for the intercepts, the covariates, and up-to-date status on the hepatitis B vaccine. To specify prior distributions for the sensitivity (SN) and specificity (SP) in the measurement model, the numbers of reported cases and non-cases must be calculated. We let A represent the true positives, B the false positives, C the false negatives, and D the true negatives. In addition, E and F represent the numbers of unreported cases and controls in the remaining population, respectively. The sensitivity priors changed with the degree of underreporting, whereas the specificity remained almost perfect. Thus, for sensitivity and specificity, we used beta distributions with the following parameters: for sensitivity, beta (A, C + E); and for specificity, beta (D + F, B). Precision estimates are provided as 95% confidence intervals and 95% credible limit ratios to facilitate comparisons.

Bayesian framework
To correct the exposure and response values, we first corrected the exposure values with the exposure model. The imperfect response values were then corrected using the beta distributions for sensitivity and specificity in the measurement model. Because the posterior distribution in the logistic models (equations 1 and 3) has a complex form involving multi-dimensional density functions (20), inference was performed via Markov Chain Monte Carlo simulation with the Gibbs sampling algorithm.
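One building block of such a sampler is the step that imputes the true response from the reported one. The following is a minimal sketch, not the authors' implementation: it applies Bayes' rule to the reported status, the current model probability of infection, and the current SN and SP draws. The function names and the example inputs are hypothetical.

```python
import random


def prob_true_case(observed, p_model, sn, sp):
    """P(true status = 1 | reported status), by Bayes' rule.

    observed: reported HBV status (1 = case, 0 = non-case)
    p_model:  current model probability of true infection, assumed in (0, 1)
    sn, sp:   current draws of sensitivity and specificity
    """
    if observed == 1:
        num = sn * p_model
        den = num + (1.0 - sp) * (1.0 - p_model)
    else:
        num = (1.0 - sn) * p_model
        den = num + sp * (1.0 - p_model)
    return num / den


def impute_true_response(observed, model_probs, sn, sp, rng):
    """Draw one imputed true status per subject (a single Gibbs-style update)."""
    return [
        int(rng.random() < prob_true_case(y, p, sn, sp))
        for y, p in zip(observed, model_probs)
    ]


# Illustrative inputs only; these are not study data.
rng = random.Random(2024)
reported = [1, 0, 0, 1]
model_probs = [0.7, 0.6, 0.1, 0.3]
print(impute_true_response(reported, model_probs, sn=0.79, sp=0.999, rng=rng))
```

With low sensitivity, a reported non-case keeps a non-negligible probability of being a true case, which is how underreporting feeds into the corrected odds ratio.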

Resampling
The present study population reflects the situation of Iran on a small scale (0.01% of the total Iranian population). Iran is in the middle-risk zone with a prevalence of about 2%: the prevalence of HBV infection was estimated to be 2.14% and 2.7% in the review studies by Alavian et al. (21) and Porolajal et al. (22), respectively. Therefore, assuming a study population of 8000, a sample of 300 was taken from the 3000 available records. Then, to determine the impact of unreported cases on sensitivity and subsequently on the odds ratio, we set the number of unreported cases to 0, 5, 50 and 500. To investigate the pattern of changes in the OR and misclassification rate more accurately and to avoid chance findings, the results were interpreted on the basis of three resamples. At each step of the simulation, using 1000 Gibbs iterations and discarding the first 50% of iterations as burn-in, the correct vaccination status and the imperfect measure of HBV were imputed from the exposure and measurement models and regressed in the outcome model to form a posterior distribution of the corrected odds ratios. Empirically, only simulations that improved the likelihood were retained to estimate the posterior distribution. The analysis was performed using R, version 3.5.1.

Demographic changes of available data in the present study
The frequency distribution of the 2000 reported cases and 1000 reported controls is shown in Table 1. Overall, 79% of the population were not up to date on their vaccination (n=2364), of whom 72% were cases (n=1694). About half of the population was female (50%), and most were over 30 years of age (79%) and married (86%). Compared with controls, a larger share of cases were from other ethnicities (65% vs. 50%), had no history of cupping (82% vs. 64%), and had no history of surgery (52% vs. 42%). There was no significant difference in the frequency of smoking (18% vs. 13%) or alcohol use (5% vs. 4%).
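As a worked illustration of the effect measure, the counts above can be arranged in a 2x2 table and a crude (unadjusted, uncorrected) odds ratio computed. This sketch assumes the stated counts fully define the table and treats "not up to date" as the exposure; the resulting figure is derived here for illustration and is not reported in the source.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Cross-product odds ratio for a 2x2 case-control table."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)


# Counts taken from the text.
cases_total, controls_total = 2000, 1000
not_up_to_date_total = 2364
not_up_to_date_cases = 1694

# Remaining cells follow by subtraction (assumption: no missing exposure values).
not_up_to_date_controls = not_up_to_date_total - not_up_to_date_cases
up_to_date_cases = cases_total - not_up_to_date_cases
up_to_date_controls = controls_total - not_up_to_date_controls

crude_or = odds_ratio(
    not_up_to_date_cases,
    not_up_to_date_controls,
    up_to_date_cases,
    up_to_date_controls,
)
print(f"crude OR (not up to date vs. up to date): {crude_or:.2f}")
```

Because it ignores covariates and misclassification, this crude value is only a reference point for the adjusted estimates.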

Prior specification of HBV sensitivity and specificity: Sample One
Of the 8,000-person population, 300 were reported cases and controls, while the remaining 7,700 were unreported in the community. Because no confirmed laboratory and clinical data were available for correcting the classification of cases and controls, the counts were based on the simulation framework and the known low error rate of laboratory tests: 197 were correctly identified as true case patients with 5 false positives, while 96 were correctly identified as true non-case patients with 2 false negatives. Assuming a complete case report (E = 0, no underreporting), this corresponds to 99% sensitivity and near perfect specificity. Afterward, by altering the degree of underreporting (E = 5, 50, 500), sensitivity diminished to 97%, 79%, and 29%, respectively (Table 2). In addition, the availability of valid data, which makes expert opinion unnecessary, yields a set of beta priors for sensitivity and specificity at each degree of underreporting.

Prior specification of HBV sensitivity and specificity: Sample Two
As in the first example, of the 8,000-person population, including 300 reported cases and controls, 181 were correctly identified as true case patients with 7 false positives, while 106 were correctly identified as true non-case patients with 6 false negatives. Assuming a complete case report (E = 0, no underreporting), this corresponds to 97% sensitivity and near perfect specificity. Afterward, by varying the degree of underreporting (E = 5, 50, 500), sensitivity declined to 94%, 76%, and 26%, respectively (Table 3). In addition, the resulting set of priors includes SP_{E=500} ~ beta(7306, 7).

Prior specification of HBV sensitivity and specificity: Sample Three
As in the previous examples, of the 8,000-person population, including 300 reported cases and controls, 197 were correctly identified as true case patients with 4 false positives, while 95 were correctly identified as true non-case patients with 4 false negatives. Assuming a complete case report (E = 0, no underreporting), this corresponds to 98% sensitivity and near perfect specificity. Afterward, by varying the degree of underreporting (E = 5, 50, 500), sensitivity dropped to 96%, 78%, and 28%, respectively (Table 4). In addition, an analogous set of priors was obtained for this sample.
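The prior construction for the three samples can be checked numerically. The sketch below assumes that the sensitivity prior is beta(true positives, false negatives + E) and the specificity prior is beta(true negatives + F, false positives), with F = 7,700 - E unreported non-cases. The function and variable names are illustrative, and the rule for F is an assumption that agrees with the SP prior for E = 500 quoted for Sample Two.

```python
POPULATION = 8000
REPORTED = 300
UNREPORTED_TOTAL = POPULATION - REPORTED  # 7700 people outside the reported sample

# Reported counts per sample, as given in the text.
SAMPLES = {
    "one": dict(tp=197, fp=5, tn=96, fn=2),
    "two": dict(tp=181, fp=7, tn=106, fn=6),
    "three": dict(tp=197, fp=4, tn=95, fn=4),
}


def beta_priors(tp, fp, tn, fn, e):
    """Return ((a, b) for SN, (a, b) for SP) given E unreported cases."""
    f = UNREPORTED_TOTAL - e  # assumption: the rest of the population are non-cases
    sensitivity = (tp, fn + e)
    specificity = (tn + f, fp)
    return sensitivity, specificity


def beta_mean(a, b):
    """Mean of a beta(a, b) distribution."""
    return a / (a + b)


for name, counts in SAMPLES.items():
    for e in (0, 5, 50, 500):
        sn, sp = beta_priors(e=e, **counts)
        print(
            f"sample {name}, E={e}: SN ~ beta{sn} (mean {beta_mean(*sn):.3f}), "
            f"SP ~ beta{sp} (mean {beta_mean(*sp):.4f})"
        )
```

Under these assumptions the prior means for sensitivity agree with the percentages reported for each sample to within about one percentage point, while the prior mean for specificity stays above 0.999 in every scenario.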

Bayesian Adjustment for correcting bias
For the first to third samples, the odds ratios were first estimated from the raw data. These estimates, 3.14, 3.51, and 3.37, respectively, serve as a basis for comparison. After correcting for misclassification using the Bayesian approach and the Gibbs sampling algorithm, in the ideal case (i.e., E = 0) the OR increased to 3.22, 3.75, and 3.54, respectively. As can be seen in Table 5, the adjusted OR increased dramatically as the degree of underreporting increased. In addition, the credible limit ratios in the corrected state were smaller than in the uncorrected (misclassified) state. The autocorrelation functions and time series graphs of the β1 estimates in the first to third samples for the various degrees of underreporting were obtained using 1000 Gibbs iterations and 50% burn-in. In the following, the graphs related to the first sample are presented.
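Posterior draws of β1 can be turned into an odds ratio summary as sketched below. This is an illustration under stated assumptions, not the authors' R code: the first half of the chain is discarded as burn-in, the remaining draws are exponentiated, and the credible limit ratio is taken as the upper 95% limit divided by the lower one. The chain used here is synthetic and is not study output.

```python
import math
import random
import statistics


def summarize_or(beta1_draws, burn_in_fraction=0.5):
    """Summarize an MCMC chain of log odds ratios after discarding burn-in."""
    start = int(len(beta1_draws) * burn_in_fraction)
    kept = beta1_draws[start:]
    odds_ratios = [math.exp(b) for b in kept]
    # n=40 yields cut points at 2.5%, 5%, ..., 97.5%; take the first and last.
    cuts = statistics.quantiles(odds_ratios, n=40, method="inclusive")
    lower, upper = cuts[0], cuts[-1]
    return {
        "n_kept": len(kept),
        "median": statistics.median(odds_ratios),
        "lower": lower,
        "upper": upper,
        "clr": upper / lower,  # credible limit ratio: smaller means more precise
    }


# Synthetic chain of 1000 draws for illustration only.
rng = random.Random(0)
chain = [rng.gauss(1.2, 0.2) for _ in range(1000)]
summary = summarize_or(chain)
print({key: round(value, 3) for key, value in summary.items()})
```

A narrower posterior gives a credible limit ratio closer to 1, which is the sense in which the corrected estimates are described as more precise.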

Discussion
The results of the current research indicated that the misclassification rate was mild in the raw data. The odds ratio increased by only a few percent when only 5 persons in the rest of the study population were assumed to be infected with the virus. For 50 and 500 such persons, however, the OR estimates increased by about half and more than doubled, respectively. Another result of our analysis was that, despite the varying OR across samples, the overall results (misclassification rate in the raw data, magnitude of the increase in effect size across scenarios, etc.) were similar with respect to the pattern of association between exposure and response.
In the present study, there are several reasons for the misclassification of vaccination exposure. One group of the population may have been hesitant to receive the vaccine and, over time, may be subject to recall bias. Another group may not be up to date on the vaccination, having received the vaccine fewer than three times. A third group may have received the vaccine after being infected with hepatitis B virus, in which case the vaccine is ineffective in preventing infection. In addition, HBV infection is diagnosed with serologic markers including HBsAg, which may carry a slight error due to inadequate accuracy of the laboratory test; this leads to misclassification of patients with hepatitis B virus and, as a result, biases the odds ratio in the present research.
Underreporting of HBV and HCV has long been recognized as a challenge. According to Robert et al. (23), few reports have investigated the accuracy of state surveillance case registries in recording clinically confirmed cases of HBV and HCV infection. Their study indicated that cases of chronic hepatitis B and C among patients in the Geisinger Health System of Pennsylvania went unreported to the Pennsylvania Department of Health (PDoH). In addition, the 28% underreporting of HBV and HCV co-infections found in that study may bias estimates of liver cancer risk (24)(25)(26). Another study tested the completeness of reporting of clinically confirmed cases of chronic HBV and HCV infection in Michigan and found that 18% of HBV cases and 35% of HCV cases were unreported. For HBV infection, the difference between reported and unreported cases across the characteristics examined was significant only for the year of initial diagnosis (27). In a similar study by Goldstein et al. (13) on the impact of vaccination on pertussis, which was prone to both exposure and response error, the odds ratio and the resulting vaccine efficacy (VE) were corrected; as in our analysis, no significant difference was observed compared with the misclassified state. In that study, varying the degree of underreporting changed the sensitivity from 90% to 20% and the VE estimate from 57% to 82%, whereas in our study the change in OR estimates across degrees of underreporting was greater. One possible reason is the inherent association between vaccination and hepatitis B virus. In another study, Luta et al. (15), addressing both misclassification and missing-data biases with a Bayesian method similar to ours, compared four models: one accounting for both biases simultaneously, two accounting for one bias each, and one raw model. They found that despite the different odds ratios across the four models, the overall results were similar with respect to the pattern of associations.
To the best of our knowledge, no previous study has examined misclassification of the HBV infection rate in the presence of public vaccination. Moreover, most studies have investigated the effect of misclassification on either the exposure or the outcome, whereas our study evaluated the effects of misclassification on both simultaneously through the model and compared the resulting effects. The current research has several limitations. First, data were lacking on proper administration of the HBV vaccine dose at different ages and on adherence to the cold chain (28). Second, for convenience, we assumed non-differential misclassification; as the degree of underreporting varies, cases might cluster at older ages, causing differential misclassification. Finally, because no previous study had applied this method to the same subject matter, we had to adopt priors from related studies. Nevertheless, the adequacy of the priors was confirmed when the program was run. Our study indicated that in retrospective studies of a vaccine and a specific disease or infection that are subject to misclassification error, the actual magnitude of the association is obscured, and it is increasingly underestimated as the number of unreported cases grows. One solution to this problem is the Bayesian method. In addition, although the data for this study were obtained from Taleghani Hospital in Tehran, the participating patients came from different cities of Iran, so generalizing the results to Iranian society may be reasonable. Finally, the method used in this research can be applied to any exposure and response that are prone to misclassification error.